PanDoRA: A Large-scale Two-way Statistical Machine Translation System for Hand-held Devices

نویسندگان

  • Ying Zhang
  • Stephan Vogel
چکیده

The statistical machine translation (SMT) approach has taken a lead place in the field of Machine Translation for its better translation quality and lower cost in training compared to other approaches. However, due to the high demand of computing resources, an SMT system can not be directly run on hand-held devices. Most existing hand-held translation systems are either interlingua-based, which require non-trivial human efforts to write grammar rules, or using the client/server architecture, which are constrained by the availability of wireless connections. In this paper we present PanDoRA, a two-way phrase-based statistical machine translation system for stand-alone hand-held devices. Powered by special designs such as integerized computation and compact data structure, PanDoRA can translate dialogue speech on off-the-shelf PDAs in real time. PanDoRA uses 64K words vocabulary and millions of phrase pairs for each translation directions. To our knowledge, PanDoRA is the first large-scale SMT system with build-in reordering models running on hand-held devices. We have successfully developed several speech-to-speech translation systems using PanDoRA and our experiments show that PanDoRA's translation quality is comparable to that of the state-of-the-art phrase-based statistical machine translation systems such as Pharaoh and STTK.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

بهبود و توسعه یک سیستم مترجم‌یار انگلیسی به فارسی

In recent years, significant improvements have been achieved in statistical machine translation (SMT), but still even the best machine translation technology is far from replacing or even competing with human translators. Another way to increase the productivity of the translation process is computer-assisted translation (CAT) system. In a CAT system, the human translator begins to type the tra...

متن کامل

Two-way speech-to-speech translation on handheld devices

This paper presents a two-way speech translation system that is completely hosted on an off-the-shelf handheld device. Specifically, this end-to-end system includes an HMM-based large vocabulary continuous speech recognizer (LVCSR) for both English and Chinese using statistical -grams, a two-way translation system between English and Chinese, and, a multilingual speech synthesis system that out...

متن کامل

Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search

We introduce a new large-scale discriminative learning algorithm for machine translation that is capable of learning parameters in models with extremely sparse features. To ensure their reliable estimation and to prevent overfitting, we use a two-phase learning algorithm. First, the contribution of individual sparse features is estimated using large amounts of parallel data. Second, a small dev...

متن کامل

Experiments on Domain Adaptation for English--Hindi SMT

Statistical Machine Translation (SMT) systems are usually trained on large amounts of bilingual text and monolingual target language text. If a significant amount of out-of-domain data is added to the training data, the quality of translation can drop. On the other hand, training an SMT system on a small amount of training material for given indomain data leads to narrow lexical coverage which ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007